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Progress Summary 

During the period December 23, 1997 and December August 31, 2004, we accomplished 
the development of 2 CFD codes for DNS/LES/RANS simulation of turbine cascade 
flows, namely LESTool and UNCLE. LESTool is a structured code making use of 5 th 
order upwind differencing scheme and UNCLE is a second-order-accuracy unstructured 
code. LESTool has both Dynamic SGS and Sparlart’s DES models and UNCLE makes 
use of URANS and DES models. The current report provides a description of 
methodologies used in the codes. 


1. Introduction 

Flow transition plays an important role in turbomachinery applications. The majority of 
boundary layer flows in turbomachines involve flow transition under the effects of 
freestream turbulence, diverse pressure gradients, wide range of Reynolds numbers, flow 
separation, and unsteady wake-boundary layer interactions. 

Prediction of this type of complex flows is an important element in analysis and 
performance evaluation of gas turbine engine components and ultimately in the design of 
more efficient jet engines. Especially, in low pressure turbine applications prediction of 
transition becomes pivotal in terms of efficiency. For low pressure turbines the flow is 
mostly turbulent at the high Reynolds number conditions encountered at take off and the 
efficiency is at its design maximum. However, at high altitudes and cruise speeds which 
correspond to lower Reynolds number conditions, unpredicted losses and substantial 
drops in efficiency have been observed. These losses are attributed to flow separation on 
the suction surface of the turbine blades. At low Reynolds numbers, the boundary layers 
on the airfoil surface have a tendency to remain laminar and hence the flow may separate 
before it becomes turbulent, causing increase in fuel consumption and drop in efficiency. 
The impact of such losses is directly felt on the operation costs. It has been estimated that 
a 1% improvement in the efficiency of a low pressure turbine would result in a saving of 
$52,000 per year on a typical airliner. 

In order to calculate the losses and heat transfer on various components of gas turbine 
engines, and to be able to improve component efficiencies and reduce losses through 
better designs, accurate prediction of transitional boundary layers is essential. When one 
deals with a complex fluid phenomena like a transition, separation and turbulence, 
several hundred millions grid points are needed to resolve boundary layers and other flow 
structures correctly. We have started to develop technology to make such large scale 
simulations not only possible at supercomputing centers like NCSA or NAS but on 
inexpensive, high-performance clusters of PCs, or “Beowulfs". These clusters are 
specialized for CFD applications, using the novel approach that the hardware, operating 
system, and application code are optimized together rather than separately. A Honorable 
Mention in the Price/Performance Category of the Gordon Bell Prize was awarded for 
this approach at IEEE/ ACM SC2000 Conference on High-Performance Networking and 
Computing. 


Several turbulence test cases have been computed and an overview of the results is given. 


2. Code Descriptions 

(1) A description of LESTool is give in Appendix I. 

(2) A description of UNCLE is given in Appendix II. 
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Appendix I - LESTool 



LESTool: A CFD Program for 3D unsteady flows 


1. LES Turbulence Models 


1.1 LES - Filtered Equations 

Large-eddy simulation (LES) is based on the definition of a filtering operation: a filtered 
(or large-scale, or resolved) quantity, denoted by an over bar, is defined as 

7(x)= jf(x')G(x,x')dx\ 

D 

where D is the entire domain and G is the filter function, which determines the size and 
structure of the small scales. For compressible flow the large-scale equations are 
operationally simpler if Favre filtered quantities are used. A Favre filtered variable is 

defined as / = pf Ip . Applying the spatial filter G to the governing equations leads to 
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where a perfect-gas equation of state is assumed. The filtered stress tensor, r and f jt 
are given by 
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The filtered heat fluxes, q ; = -kdT tdx } and q J = -icdT/dXj . 

All boxed terms must be modeled. The following list defines modeling assumptions for 
all six terms which have to be modeled. 

1 . This first boxed term is the divergence of the subgrid-scale (SGS) stresses, 

T y° S = P( u , u j ~ ufij), and has to be modeled. Note that the trace of the SGS stresses in 

compressible flows cannot be included in the modified pressure and requires separate 
modeling. The modeling will be described in the next section. 

2. This 2 nd boxed term appears because of the nonlinearity of the viscous stresses and in 
invariably neglected. 

3. The SGS heat flux model for qf :s = piu'T -u } T) will be described in the next section. 

4. This term, D } = {u ] u k u k -UjU k u k ), is similar to the turbulence diffusion that appears 
in the subgrid-scale kinetic energy equation. 

5. This term, d(q } -q^t dx J , is neglected because of the same reasons as 2. 

6. This term, d(r ji u i - f Jl u i ) / dx ; , is the viscous SGS work and is neglected. 

To summarize the following quantities will be be modeled in our implementation: 

1 . SGS stresses, t^ gs . 


2. SGS heat flux, q s f s . 


1.2 Dynamic SGS model 

For the left-hand side a trace- free Smagorinsky eddy viscosity model is used for t* gs . 

However the eddy viscosity coefficient will be a function of the instantaneous flow 
variables. Let 




where S', is the trace-free rate of the strain tensor 
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is the norm of S t< 




The isotropic part of the SGS Reynolds stress tensor q 2 = t kk has to be modeled 
separately. It is parameterized by the expression: 

q 2 = 2C,pA 2 

To compute C, , the following equation is used: 


pu,pUj ^ 

pu i u j ^ =2C, 

( 

A a2 

P A 

S 

2 

-A 2 p 

S 

2 ^ 

P 





) 


Germano et al. [5] showed that expressions similar to the ones multiplying C, can 
become zero. Therefore an averaging procedure is needed to make the determination of 
the SGS coefficients well conditioned. It is assumed that C t is independent of the 
directions in which the flow is homogeneous. Volume averaging leads to 
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To obtain C, we use the following model 
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Contracting of M tJ which was recommended by Lilly is outlined below. Since equation 
for L tJ represents six independent equations in one unknown, its error can be minimized 
by applying a least squares approach. Define G as 
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By setting dG/dC = 0 , C is evaluated as 



where o denotes a spatial averaging operation. 


1.3 DES model 


The DES turbulence model is based on the blending of RANS and LES to provide a 
model that can accurately predict unsteady flows without requiring high-precision grids 
near the walls. A RANS turbulence model is used to simulate the turbulence within the 
boundary layer, gradually subsuming into the LES model as one moves away from the 
wall. Further details on this technique can be found in [15], [14] and others. The RANS 
turbulence model used in LESTool is the one-equation Spalart-Allmaras model, 

where 
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with the coefficients given below 

c bx =0.1355, c hl =0.622, c„, =7.1, c u2 =5.0, c wl =c bl / k 2 +(l + c b2 )/<r, 
c w2 =03,c w3 =2,«t = 2/3,x- = 0.41,Q = |Vxm|, 

and d is the distance to the nearest wall. Note that this formulation does not include the 
trip functions. Large values of r can be truncated at 10 or so. The turbulent viscosity for 
the momentum equations is found from v t = f vX v . For the DES approach the length scale 
d is changed into 

d ’ = min (d, max( Ax, Ay, A a)C t ) 


with C i = 0.65 . This gives the turbulence model behavior near the wall and LES type 
behavior away from the wall 


2. Numerical method 


The compressible Navier-Stokes equations are discretized using an iterative diagonal 
dominant ADI (DD-ADI) algorithm that can be written in the following form: 



[d l + a tL\ + A tl\ + A tl\ ]su = RHS h 

The right-hand side is approximated by the newly develop ENO-Pade method [17] for the 
convective flux and sixth-order central for the diffusive terms. The basic algorithm is the 
following. First the flux at each cell face is computed as shown below 

Quantities with the superscript H are computed using a 8th-order central interpolation 
method. For the quantities with the superscript R or L the standard ENO-interpolation 
[6]. Than the flux is differentiated using a cell centered Pade formula [10]: 

h 3 h 

The left-hand side is approximated by a lower-order method. The high order solution can 
be achieved by performing inner iterations since the order of accuracy is not affected by 
low order treatment of the left-hand side. 

The proposed algorithm has the following form: 

(D L + a tL L x )d(su')= d l [(jf/ 3 > m -' -(<?[/' y”-'] 

(. D L + A tL L y )d(SU 2 )= D l [(SU') m -(SU 2 )”-'] 

(D L + A tL\ )d(5U 3 ) = D l [( SU 2 ) m - (SU 3 )"-' ] 

This algorithm is easily parallelizable because we loop trough a three dimensional array 
plane by plane. The outline of the algorithm is the following: 

1 . for each k: solve equations in i and j planes. 

2. for each j: solve equations in k and i planes 

3. for each i: solve equations j and k planes 

This means after completion of the above algorithm we have already completed two 
iterations. From our experience three to four iterations are needed for each time-step to 
reach the required accuracy. 

Fortran 90 was chosen as the programming language for this project because of the new 
powerful array syntax, the addition of derived types and the module concept. The code 
has been developed based on an object-oriented methodology to simplify the transition 
from the Cartesian implementation to an implementation for general coordinates and 
chimera-type overlapping grids. The new features of Fortran 90 like abstract data types or 
generic programming support an object-oriented programming style even though it is not 
designed to be an object-oriented language. 


3. Parallelization 


3.1 Shared Memory parallel approach 

The LESTool code was parallelized in a straightforward manner: The code was compiled 
with the -mp option to enable parallelism in the compile phase. This was combined with 
placing OpenMP compiler directives, which instruct the compiler to generate code that 
will execute in parallel. 

These directives were placed at key spots within the code for the greatest parallel 
efficiency. This resulted in a decomposing of the 3D problem into groups of ID lines, 
with each group assigned to a dedicated processor. 

The efficiency of the parallel decomposition was enhanced by the use of the first touch 
policy which is specific to the SGI Origin 2000. This means that memory allocated is 
physically placed on the processor which touches the memory location first. This means 
that all large three dimensional blocks of memory where initialized in planes of constant 
k to get a memory distribution which is favorable for our algorithmic design (see figure 
1 ). 
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Figure 1 : Partitioning of the data along the k-axis 

Since we target shared memory computers there is no explicit data redistribution needed. 
Instead we split up the k -sweeps among the processors by partitioning work along the y 
-axis (fig. 2). This is much simpler and easier to implement compared to the distributed 
memory concept. 
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Figure 2: Sweep along the zx -planes 





3.2 Distributed memory approach 

In order to achiever better portability and performance on a wider range of computational 
platforms a distributed approach was also implemented. The parallel structure of the 
distributed version of LESTool is based on splitting the computational grid into sub- 
blocks, which are then distributed to each processor (figure 3). For a complex geometry, 
this is a nontrivial exercise where an uneven load-balance across different processors 
could significantly reduce the computational efficiency of the overall code. The 
partitioning of the grid into sub-blocks is performed independently of the grid generation, 
using virtual partitions to define the domains corresponding to each processor. The 
current algorithm governing the partition process is based on spectral recursive bisection 
in conjunction with a small database of the computational speeds associated with each 
node. The splitting is performed as a preprocessing task, generating a file that maps the 
grid sub-blocks onto the processors. This file is the only difference between performing a 
serial computation and a parallel computation on a distributed machine. 

Grid block 



virtual block boundaries 



Figure 3: Communication pattern for the distributed approach 





Communications between the grid sub-blocks occurs when the sub-blocks exchange data 
about the flow variables at the boundaries. As show in figure 3 , the flow variables on the 
edge of one grid block are communicated to the dummy points of the neighboring grid 
block, and vice versa. LESTool requires such a communication step after each update, or 
sub-iteration, of the flow variables. The low-level implementation of the communication 
between the sub-blocks uses a MPI-based communications system. The communication 
model is a mailbox algorithm where the data is sent in a non-blocking communication 
mode as early as possible. 

4. Accuracy and Performance of LESTool 
4.1 Accuracy of the ENO-Pade method 

In a previous proposal a fifth-order upwind scheme has been proposed to simulate flow 
around a low pressure turbine. This method showed too much numerical dampening so 
that the transition and separation on the low-pressure turbine couldn't be simulated. In 
order to develop a code suitable for transition and turbulence a new numerical method - 
the ENOPade method - was developed [17]. This method combines the stability of the 
ENO scheme with the high-order accuracy of the Pade method. Several test cases have 
been computed and the damping of the new scheme is much lower as you can see below 
from one test case. This case consists of the unsteady advection of a vortex in a uniform 
flow. Here the capabilities of the different numerical schemes to accurately advect the 
vortex structures are tested which is critical to LES/DNS simulations. 



Figure 4: Contour lines of vorticity magnitude at three time intervals of ENO-Pade 
method. 



Figure 5: Swirl velocities of the vortex along y=0 at T=12 




4.2 Performance Results 


4.2.1 Single processor performance tuning 

The key point on cache based architectures to achieve high floating point performance is 
cache-awareness of the algorithm. This stage of tuning is very important because efficient 
utilization of the cache architecture assures good overall performance. Single processor 
performance is the basic step to get good overall performance for the parallel 
computation. 

During the development of the code special care was taken by placing variables which 
are used concurrently in the calculation close together in the main memory. One major 
detail is that the first index in the data arrays is used for addressing the different 
components of global variables, such as conservative variables. This is contrary to a 
vector architecture, where the index for a variable is normally addressed by the last index 
in an array. The data is copied into one-dimensional scratch arrays which fit into the 
second level cache. The main algorithm is performed on these scratch arrays which have 
a memory layout and access pattern (stride one) optimal for the cache architecture. This 
measure alone resulted in a floating-point performance of 60 MFLOPS. 

To achieve further speedup of the code, the main bottle necks identified were the 
tridiagonal and block-tridiagonal matrix solvers. To increase the utilization of memory 
bandwidth, all arithmetic operations involving the 5x5 block matrices, used in block 
tridiagonal and periodic block tridiagonal solvers, were unrolled. This resulted in a much 
longer and less desirable code, but the performance results outweigh this disadvantage. 
The LESTool code achieves a floating-point performance of about 120 MFLOP/s on a 
1 80 MHz Origin 2000. We consider this a very satisfactory performance. 

Single processor run times for a single test case sufficiently large so that it does not fit in 
the L2 cache of all our CPUs has been chosen. The runtime of the code / time step is 
reported in table 1 . The LosLobos results (733 MHz Pentium III) are much worse than 
the Athlon results - this is probably related to (2 computations/2 data transfers per cycle). 
However, we are not absolutely confident that this is the optimal possible performance of 
LESTool on LosLobos, and as such subsequent results will emphasize the KLAT2 and 
KFC1 outcomes. 

Table 1 : Execution time in seconds on different Linux platforms for a single time step for 
the 64 3 test case 



Compiler 

uoubie Precision 

Cust/proccsscr 

$/TSpS 

700 MHz Athlon 

Gcc-3.2 

37s 

$625 

3030 

733 MHz 
Pentium III 

Egcs-2.9.1.66 

123s 

$2930 

18170 

1 .5GHz Athlon 
MP 

Gcc-2.96 

22s 

$750 

2240 



4.2.2 OpenMP performance tuning 

In figure 6, the same test case with the improved fine-grain parallelization is shown. The 
removal of the barriers gives a much better speedup and provides a better performance of 
the fine-grain parallelization. The program scales better for larger number of shared 
processors. However, there still seems to have a limit on the number of the processor 
used for the fine-grain parallelization. Since the memory is distributed in planes of 
constant k over the available processors using more than 32 processors is no longer 
advantageous. By choosing a much larger test case of 256 3 grid points, as also shown in 
figure 6 by the dashed line, it can be seen that because of a larger computational load per 
processor the performance scales much better. For this case, the program performs well 
up to 64 processors. 



Figure 6: Speedup with reduced synchronization and orphaned OpenMP directives 

4.2.3 Parallel performance 

In order to achieve a reasonable throughput the wall time for our computations with 
LESTool must be at most on the order of 10s / time step. A comparison of the wall times 
for the three different test cases follows 

In figure 7 the wall times are compared for KLAT2 and KFC1. The performance of 
KFC1 is obviously better than that of KLAT2 because of the higher clock frequency. The 
clock frequency of KFC1 is double than that of KLAT2 but the speed of LESTool on 
KFC1 is obviously not twice compared to that on KLAT2. Here, the memory system 
piays an important role as well— KFCi has a PC21000 memory system, whereas KLAT2 
has a PCI 00 memory system. Our goal to reach a time of less than 10s / time step is 
already reached on two processors on KFCI , while for KLAT2 it takes about six 
processors to come into the same total performance range 





processors 

Figure 7: Wall clock time on KLAT2 and KCF1 for the 64 3 case 

The 1 28 3 case is much more demanding. In this case it takes about ten nodes or twenty 
processors on our dual processor machine KFC1 to obtain a wall clock time under 10s. 
For KLAT2 forty processors are needed to obtain the desired computational speed. Note 
that for this example KFC1 does achieve twice the speed of KLAT2. 



processors 

Figure 8: Wall clock time on KLAT2 and KCF1 for the 128 3 case 

The 1 96 3 case severely stresses our cluster computers. Here we cannot reach the 1 Os 
mark on either of the clusters. For KFC1, we can achieve less than 20s / time step on 36 
processors, or twice tl chievable is 22.7s. 



processors 

Figure 9: Wall clock time on KLAT2 and KCF1 for the 196 3 case 



KFC1 has a three-way channel-bonded network. The code scales well for all three test 
cases as shown in figure 10. Note that we could run the single processor version for the 
128 3 but the performance was so low because of swapping that we decided to scale the 
larger cases to the minimum number of processors on which the case could reasonable 
run. Note that the sizeable drop in performance for 34 processors in all three cases. This 
is related to poor load balance. The only way we can divide our grid is in two slices in j- 
direction and seventeen slices in the k-direction. Given the cubical symmetry of this 
problem, the best subdivision is a equal number of cuts in the i, j, k direction. An 
example for an optimal partitioning is the case for 16 processors. Each sub-block consists 
of 1 6 3 grid points. This example shows the importance of the partitioning on the parallel 
performance of LESTool. 



Figure 10: Speedup on KFC1 for the 64 3 , 128 3 and 196 3 cases 

KLAT2 has a FNN network architecture which is optimized for next neighbor 
communication. LESTool scales even better on KLAT2 than it does on KFC1 for all 
three test cases as shown in figure 1 1 . Note that a similar effect caused by uneven load 
splitting can be seen on this machine as well. 



Figure 11: Speedup on KLAT2 for the 64 3 , 128 3 and 196 3 cases 
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5. DNS of homogeneous turbulence 


The Direct Numerical Simulations (DNS) of decaying isotropic turbulence were 
presented by Blaisdell et al. [2]. The BlaisdelTs DNS initial spectrum of case ia64f was 
A conditions generated using Crecomp [4] are used as the input for LESTool. The 
velocity fields needed for LESTool are given by Crecomp in such a way that irrespective 
of the grid density, the energy spectrum produced has the same energy for all the initial 
spectra. The rms velocities are taken as the input decides the total energy under the curve. 
This energy is divided across the range of wavelengths available. 



Figure 12: (64 3 grid) Comparison of energy spectra for various values of Cdes with 
Blaisdell’s spectra at t=7 

In the above figure, the Blaisdell’s DNS result is indicated by the solid black 
curve running all the way down. All the other curves with different symbols correspond 
to simulations run using LESTool for various values of grid sensitivity parameter Cdes 
considered. It can be seen that when a 64 3 grid is used, the No Turbulence Model matches 
best with the Blaisdell’s result. When we look at the DES results, the curve with 
Cdes=0.01 almost merges with the No Model case and Blaisdell’s curve. This somewhat 
confirms that the LESTool has generated a DNS solution as did in Blaisdell’s 
simulations. 


6. DNS of turbulent channel flow 


The results presented in this paper were based on a direct numerical simulation of a fully 
developed channel flow reported in [7]. The flow has two periodic (x and z) directions 
and the solid wall condition was imposed in the y direction. The Reynolds number based 

on the wall shear velocity, u t = ^r w ip , is Re r = p w u t II / p w =180, where H is half the 

channel width. The streamwise and spanwise dimensions of the channel are AnH and 
2nH , respectively. The computation is carried out on a grid using 200x121x200 grid 
points in x , y , and z , respectively. The flow field is initialized with a laminar solution 
and random fluctuations are superimposed on the pressure field. The governing equations 
are then integrated in time until a statistical equilibrium is reached (tu T i H > 30 ). 



Figure 1 3 shows the dimensionless velocity, u + =u/u T , plotted against dimensionless 
wall distance, y + = yu f / v . Also shown in this figure is the comparison of the predicted 
mean velocity profile against the data cited in [8] for the same Reynolds number. The 
dotted line represents the desirable profile in the viscous sublayer, u* = y* , and the 
dashed line denotes the log-law line, u* = In y + / 0.41 + 5.2 . The figure shows that the 
current mean velocity profile matches the data of Kim et al. [8] very well. The 
comparison of the rms values as shown in figure 14 of the velocity shows good 
agreement with the data of Kim et al. 



Figure 13: Mean velocity profiles, solid line, LESTool; circled symbols, DNS of Kim et 
al.; thin dashed line , viscous sublayer; tick dashed line , log layer. 



Figure 14:Comparison of RMS values 



7. Simulation of a flow over a cylinder 


A more complex basic flow is flow over a cylinder. This is an informative test case for 
investigating DES, as many of the flow characteristics that have been measured in 
experiments are not captured by traditional RANS techniques. A critical consideration 
when working with DES is proving to be grid construction - fundamentally, the local grid 
dimensions control the location of the transitional region where DES switches from RANS 
to LES mode. Strelets discusses the challenge of grid design in the conclusion of his 2001 
paper [16]. Howevere,, it is not yet clear if there is in general a truly grid-independent 
solution with DES. Rather than asymptotically approaching a single, grid-independent 
result, sufficiently fine grid outputs might only be confined to a definable range of 
solutions. This presents an additional challenge to the validation process for Detached- 
Eddy Simulation and expands considerably the number of simulations necessary to 
concretely define the advantages and limitations of DES. In figure 15 lines of the vorticity 
are shown. This shows that the shear layer starts to become turbulent. Further 
calculations are needed to determine the influence of grid resolution on the performance 
of the DES mode. 



Figure 15: Vorticity contours of flow around cylinder 

8. PaK-B blade cascade simulations 


In the current study, the cascade flow is simulated for the PakB blade. The grid resolution 
is 200x100x100, shown in Figure 16. The blade Reynolds number in the simulations is 
25,000. This is typical Reynolds number for high altitude aircraft engines. This case is 
perfect for the application of Large-Eddy simulation (LES) or Detached Eddy simulation 
(DES) since the Reynolds number is relatively low, so the resolution requirements are not 
as demanding as for the design point. A large separation region is visible starting at about 
60% of the axial cord and extending to the end of the blade. The flattened region or plateau, 


between 60% and 90% of the axial cord, shows the region of boundary layer separation. 
The numerical results of this study are compared with the experimental case 

of Bons [3]. 



Figure 16: 3D view of the grid. 

In the inflow plan (grey plane in figure 16) a random pressure field is prescribed to 
simulate the unsteady turbulent inflow. The blade, shown in red has wall boundary 
conditions and the green plane shows the outflow condition. The simulation is initialized 
with a homogeneous turbulence field as discussed in Chapter 6. 

In figure 17, the averaged solution of the velocity filed in the mid plane is shown and the 
comparison of the time-averaged pressure coefficient is shown in Figure 18. The 
comparison with the experimental data seems very good. 



i i t 


Figure 17 the time-averaged solution of the velocity vector at the center plane. 




Figure 18: C p comparison of calculations with design curve and experimental values. 


References 


[1] Donald J. Becker, Thomas Sterling, Daniel Savarese, John E. Dorband, Udaya A. 
Ranawak, and Charles V. Packer. Beowulf: A parallel workstation for scientific 
computation. In Proceedings, International Conference on Parallel Processing, 1995. 

[2] G. A. Blaisdell, N. N. Mansour, W. C. Reynolds, Department of Mechanical 
Engineering, Stanford University, “ Numerical Simulations of Compressible 
Homogeneous Turbulence”, Report No. TF-50, January 1991. 

[3] J.P. Bons, R. Sondergaard, and R.B. Rivir. Turbine separation control using pulsed 
vortex generator jets. In Proceedings of TURBOEXPO 2000: International Gas Turbine 
Conference, number 2000-GT-0262, pages 1-10, Munich, Germany, May 2000. 

[4] Sai Kumar Doddi, “Validation of Detached Eddu Simulation Using LESTool for 
Homogeneous Turbulence, “ MS thesis, Department of Mechanical Engineering, 
Univresity of Kentucky, 2004. 

[5] Massimo Germano, Ugo Piomelli, Parviz Moin, andWilliam H. Cabot. A dynamic 
subgrid-scale eddy viscosity model. Phys. Fluid, A3, 07 1991. 

[6] A. Harten, B. Enquist, S.Osher, and S. Chakravarthy. Uniformly high order 
essentially nonoscillatory schemes iii. Journal of Computational Physics, 71 :23 1-303, 
1987. 

[7] Th. Hauser, T. I. Mattox, Jr. R. P. LeBeau, H. G. Dietz, and P. G. Huang. High-cost 
CFD on low-cost PC clusters. In Proceedings of ACM/IEEE SC2000, Dallas, Texas, 
November 2000. 

[8] J. Kim, P. Moin, and R. Moser. Turbulence statistics in fully developed channel flow 
at low Reynolds number. Journal of Fluid Mechanics, 177:133-166, 1987. 

[9] J.P. Lake, P.I. King, and R.B. Rivir. Low reynolds number loss reduction on turbine 
blades with dimples and v-grooves. In AIAA-00-0738 , 2000. 

[10] S.K. Lele. Compact finite difference schemes with spectral-like resolution. Journal 
of Computational Physics, 103:16—42, 1992. 

[11] R.E. Mayle. Two-equation eddy-viscosity turbulence models for engineering 
applications. Journal of Turbomachinery, 1 1 3:409-537, 1991 . 


< 



[12] R. Narasimha. Post-workshop summary. In MinnowbrookII-1997 Workshop on 
Boundary Layer Transition in Turbomachines, edited by J.E. LaGraff and D.E. Ashpis, 
1997. 

[13] R.B. Rivir. Transition on turbine blades and cascades at low Reynolds numbers. In 
AIAA-96-2079, 1996. 

[14] M. Shur, P.R. Spalart, M. Strelets, and A. Travin. Detached-eddy simualtion of an 
airfoil at high angle of attack. In W. Rodi and D. Laurence, editors, 4th Int. Symp. Eng. 
Turb. Modelling and Measurements, Aiaccio, France, 1999. Elsevier. 

[15] P.R. Spalart, W.-H. Jou, M. Strelets, and S.R. Allmaras. Comments on the feasibility 
of les for wings, and on a hybrid rans/les approach. In C. Liu and Z. Liu, editors, 1st 
AFOSR Int. Conf. on DNS/LES, Advances in DNS/LES. Greyden Press, Columbus, OH, 
1997. 

[16] M. Strelets. Detached-eddy simulations past a circular cylinder. In Proceedings of 
39th AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, 2001 . AIAA. 

[17] Z.Wang and P.G. Huang. An essentially non-oscillatory high-resolution pade-type 
(eno-pade) scheme. In 38th Aerospace Sciences Meeting & Exhibit, number AIAA 2000- 
0918. AIAA, January 2000. 


Appendix II - UNCLE 


A Center Pressure Based Method for Two/Three-Dimensional Unstructured 
Incompressible Navier-Stokes Solver 


Abstract 


A center pressure based method is presented in this paper, and which has been implemented 
into a new two/three-dimensional parallel unstructured CFD code, UNCLE, which is developed 
at the University of Kentucky to meet the challenges of physical problems with complex 
geometries and complicated boundary conditions while maintaining high computational 
efficiency. Good load balancing across computational nodes is achieved by using METIS. In 
order to demonstrate the accuracy and performance of center pressure based method, several test 
cases are presented for validation such as two-dimensional incompressible flow past a flat plate, 
two/three- dimensional driven cavity flow, and two/three- dimensional flow over a circular 
cylinder. Notably, an extensive qualitative and quantitative study of two-dimensional flow over a 
circular cylinder for low Reynolds number is also presented in this paper. 

1 Introduction 


Continual improvements in computer technologies and computational fluid dynamics 
(CFD) algorithms have established CFD codes as a reliable tool for fundamental research or 
industrial applications. To deal with increasing grid sizes and demands for faster output, parallel 
computation of CFD has become a standard approach. To deal with the different challenge 
presented by some physical problems with complex geometries and complicated boundary 
conditions is now approached through unstructured CFD grids due to their ability to smoothly 
conform to complicated boundaries. However, combining unstructured grids with a parallel code 
presents still other challenges, such as achieving well-balanced grid decomposition on a 
distributed system and efficient parallel performance. In order to meet these challenges, a center 
pressure based method has been implemented into a new parallel unstructured CFD code called 
UNCLE, which has been developed at the University of Kentucky. UNCLE is designed to meet 
the challenges of using unstructured grid codes on high-performance parallel computers. It is a 
two/three-dimensional finite volume unsteady incompressible Navier-Stokes solver with center 
pressure based SIMPLE algorithm with second order accuracy in both time and space. To 
increase flexibility in complex geometries, center pressure based method is extended to use a 
variety of grid types, such as triangular, quadrilateral, tetrahedral, and hexahedral meshes. To 
obtain good load balancing across computational nodes, METIS [1] is applied for domain 
decomposition. METIS is a set of programs for partitioning graphs and finite element meshes, 
and for producing fill-reducing orderings for sparse matrices. The algorithms implemented in 
METIS are based on multilevel graph partitioning schemes. The key features of METIS include 
extremely fast partition, high quality partitions, and low fill orderings. The parallel construction 
of UNCLE is based on message passing interface (MPI) protocols and has worked successfully 
on systems ranging from commodity PC clusters up to traditional supercomputers. In order to 
demonstrate the accuracy and performance of center pressure based method, several test cases 
are presented for validation such as two-dimensional incompressible flow past a flat plate, 


two/three-dimensional driven cavity flow, and two/three-dimensional flow over a circular 
cylinder. 


2 Numerical Methods 


A center pressure based method for two/three-dimensional finite volume unstructured 
incompressible Navier-Stokes solver for steady/unsteady flow fields is presented in this paper. It 
is center pressure based SIMPLE algorithm with second order accuracy in both time and space. 

In order to compute numerical flux on interfaces, a second order upwind scheme is adopted to 
compute advection terms and second order central difference scheme is used for diffusion terms. 
Non-staggered grids with the Rhie and Chow momentum interpolation method [2] is employed 
to correct the checkerboard solution in the SIMPLE scheme. 

2.1 Governing equations 

The governing equations for unsteady incompressible viscous flow under the assumption of no 
body force and heat transfer are as below. 


Conservation of Mass 

~\pdV = -<^pu l n l dS 


( 1 ) 


Conservation of Momentum 

\p u , dv = ~'{p u , n , u J ds ~ , {p r ' I d S + ^.r IJ n t dS ( 2 ) 


Conservation of Energy 

— JpEdV = -<^pu i n l EdS - ^pu jTijdS + j r i; n t dS 0) 

3t y 

where p is density, p is pressure, u, is component of velocity vector, n, is unit normal vector of 
the interface, Ty is tensor of shear force, and specific internal energy e = e + j(u 2 + v 2 + w 2 ) . Notably, 
density is constant for incompressible flow. 


2.2 Convective and diffusive fluxes 


Figure 1(a) shows the schematic diagram for integration area for convective fluxes. By using 
Taylor series expansion, flow properties on the interface can be obtained by Eq. (3). 


d<j> 




dy 


(^) + | {z,-z Pi ) + HOT 
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8<j> 


(3) 




where </> stands for the velocity components and pressure, the superscript RHS and LHS denote 
the approximation from the right-hand side and left-hand side of the interface respectively, and 
HOT represents higher order terms. By substituting Eq. (3) into Eq. (4), interfacial flow 
properties ^/can be obtained. 

t, +t u,s )- l -sign(l,m)(r S -r s ) (4) 

The gradients at the nodal points (cell centers) are evaluated by the Gauss’s divergence theorem 
as below. 


[ ?£-dV= Un,dA 
rdx , J 


dy „ j~i 


(5) 


dx, V 

where N/ ace is the total number of interfaces of the cell and V denotes the volume of the control 
volume cell. 


The schematic diagram for diffusive fluxes is shown in Fig 1(b). The gradients at the interface 
can be evaluated by using Chain rule as Eq. (6). 

dl = d£dl d^dti 3$_c>4_ 

dx 34 dx drj dx 84 dx 

dl = dld4_ d^dji d^d£ ( 6 ) 

dy 84 dy drj dy d<4 dy 

d<t>_ _ 8£d4_ d^dji 8£8£ 

dz 84 dz dtj 8z 84 dz 

where the local coordinate system (4,t],4) is defined by the type of mesh separately. 


For triangular mesh in Fig. 1(b), 4, is the vector form nodal point Pi to P 2 , 77 is the vector from 
vertex Vi to V 2 , and Q is the integration area for diffusive fluxes. The diffusive fluxes can be 
approximated by Eq. (7). 


f£) -*,)] 
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where (j>p l denotes the properties at nodal points and <f>vi denotes the properties at vertices. 

The values of vortices are obtained by averaging surrounding nodal value, in which inverse 
distances from all surrounding nodal points are considered as weighted function. 


2.3 Center pressure based SIMPLE algorithm 

By using an initial pressure field, P", we can obtain u", v n , and w" by solving the momentum 
equations in an uncoupled form. The momentum equations can be written in the form as Eq. (8). 


( 8 ) 
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where the coefficients a„b and a c are 

a„ b = ma x(-m f ,0) + p f (£«, + 4/h. + 4z*h) A ’ 

A = Za»’ 

nb 

and the RHS term can be written as 
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where subscript c denotes the cell we are solving, subscript nb denotes the neighbor cells, and A 
denotes the interfacial area. In this paper, we solve Eq. (8) by using Gauss-Seidel method. Then, 
we can obtain u , v , and w by Eq. (9). 

u =«" + Aw 
v* = v" + Av 


w' = w n + Aw 

Although at this stage u*, v*, and w satisfy the momentum equations, they do not necessarily 
satisfy the continuity equation. In order to satisfy the mass conservation, one has to interpolate 
the velocity to the interface. 

However, this interpolation will lead to the checkerboard solutions. In order to avoid the 
checkerboard solutions, one has to allow the interfacial velocity to be driven solely by the 
pressure difference. To achieve this aim without sacrificing the accuracy, one can divide the 
interpolated interfacial velocity into two components: one is the velocity component without the 
pressure contribution and the other is solely the pressure contribution. The former, which is at 
the cell center, can be written as: 


. dp n V c 
dx a c 
. dp" V 
dy a c 
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dz a c 

The latter is obtained directly from the pressure difference of the two adjacent nodal points, Pi 
and P 2 such that the interfacial velocity can be expressed as: 
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Where f/and aj are obtained by interpolation to the interface. 

We further assume that there are corrections to u / , v/ , and w/, such that the continuity equation 
can be satisfied by using Eq. (12). 


^ p[{u f + Au' f )«, +( v f + Av) )n 2 + (w) + A w) )n 3 ]A = 0 (12) 

#=1 

We can rewrite Eq. (12) as 

pfAu^n, +A v' f n 2 + w', n 3 ] A = -^pjiv^n, +v’ / n, + w^n^A (13) 

1=1 ;=1 

where the right-hand side in Eq. (13) represents the mass imbalance in the control volume cell. 
One assumes there is a corresponding pressure correction field, p \ which drives the velocity 
corrections according to: 
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By substituting the velocity correction equations into the equation for the mass imbalance, we 
can obtain the equations of the pressure correction: 

a cPc ^J^a^p'nt + b 

nb 

were a„* and a c in the continuity equation are 

= Pi — +— £,«2 +— ’ 

fl / «/ a / 


and 
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Once the pressure correction is obtained, one can update the pressure field by: 

p" +1 = p"+a p p' (16) 

where a p is the under-relaxation factor for pressure and is generally with a value of 0.5-0.8. Then 
the velocity correction on the interfaces as well as nodal points will be updated. 


2.4 Partitioning approach 


Figure 2 shows the schematic diagram of partitioning approach for center pressure based method, 
which is also implemented into UNCLE. In Fig. 2, blue points indicate vertices, red points 
indicate nodal points, and white points indicate the boundary points. By using this approach, the 
control volumes on the boundary are not split. Only communication of nodal values is needed for 
parallel computation which makes the implement of MPI in an unstructured grid more 
straightforward. 

Excellent load balancing between the subgrids on each node is achieved through using 
METIS for domain decomposition. METIS can partition an unstructured grid into any integer 
number of zones without losing load balance. It is compatible with many platforms, convenient 
for running CFD codes on a variety of supercomputer to cluster architectures. Present 
partitioning approach has been tested by a number of two/three-dimensional geometries. All 
results show good load balances. In order to demonstrate the capability of this partitioning 
approach, this paper will present two cases — one is the grid for two-dimensional flow over 
circular cylinder in triangular mesh, and the other is the grid for three-dimensional flow over a 
circular cylinder in tetrahedron mesh. The definition of load-imbalance rate Limb and load- 
balance rate Lb in this paper is defined as Eq. (17) and (18). 

xioo% (I 7 ) 




N 


Lb^i-Lm, (18) 

where N„ 0 d e is grid size of the node and N mg is the average grid size. By using load balance rate, 
we can compare the load balance quantitatively. 

Figure 3(a) shows the partitioned grid for 2D flow over a circular cylinder in triangular mesh. 
The number of total grid points is 51,363 and the number of total cells is approximately 0.1M. 
The grid is partitioned to 1 6 zones for parallel computation. The cell distribution is not uniform, 
denser near the cylinder and coarser away. The load-balance distribution on each node is shown 
in Fig. 3(b). The x-axis indicates the node number and the y-axis indicates load-balance rate. The 
resulting load-balance rates are very close to 100% on every node with an average load balance 
rate of 98.37%. Figure 3(c) shows a partitioned grid for three-dimensional flow over circular 
cylinder with the internal grid distribution visible in a cut-away. The number of total grid points 
is approximately 0.3M and the number of total cells is approximately 1 .3M. In this case, the grid 
is partitioned to 32 zones. The average load-balance rate is 97.9%. Our test results show that 
present partitioning approach has excellent load balance in two/three-dimensional grids with 
various types of meshes by using METIS for domain decomposition. 


2.5 Time discretization 


In this paper, a second-order fully implicit scheme is employed for the temporal discretization. In 
here, we take a one-dimensional equation example: 

3^-4 ^ , a/(r') 0 (19) 

2 At dx 

where <f> is primitive variable, / is interfacial flux, and the superscript n indicates the index in 
time. A deferred iterative algorithm is employed to obtain </> n+I by substituting (20) into (19), 

( 1 f +l ) nt ' = if" )” + (A </>) m (20) 

where the subscript m stands for the sub iteration level. The final equation is 


( 21 ) 
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The right-hand-side of Eq. (21) is explicit and can be implemented in a straightforward manner 
to discretize the spatial derivative term. The deferred iterative algorithm is strongly stable, and 
the solution f +I is obtained by using inner iterations to reach the convergent solution of the 
right-hand-side of Eq. (21), which means A<f> is approximate to zero. A sub-iteration is performed 
at every time step so that this method is fully implicit. 

3 Results 


3.1 Two-dimensional laminar incompressible flow past a flat plate 

In this case, two-dimensional laminar incompressible flow past a flat plate is simulated. The 
schematic diagram including geometry and boundary conditions is shown in Fig. 4. A uniform 
free stream boundary condition is imposed on the inlet. Because of the viscous effect, the 
laminar boundary layer begins to grow at the position x=0 on the plate. The initial condition for 
the entire computational domain is uniform free stream. A quadrilateral mesh of 600x200 is used 
for both Reynolds number (Re) at 5,000 and 50,000 based on a non-dimensional length scale of 
unity. The minimum distance from the wall is 5xl0* 5 (corresponding to y + =0.1) which is fine 
enough to capture the phenomena within the boundary layer. In this case, the computational 
domain is divided into 16 zones. The parallel computation is performed on KFC3a, a 16 nodes 
PC cluster with Pentium IV 2.4 GHz CPU and gigabit network developed at the University of 
Kentucky. The results of this computation are compared to the standard Blasius solution for a 
laminar flat-plate boundary layer. Figure 5(a) shows the rj = yJ Re^/x versus w/U plot at x=4.5 

from present results for Re=5,000 and 50,000 in comparison with Blasius solution. Both present 
results are good agreement with Blasius solution. Figure 5(b) shows the plot of momentum 
thickness (Ree) versus coefficient of skin friction (c/) with present results and Blasius solution. 
Good agreement is obtained by comparing our present results with Blasius solution. 

3.2 Two-dimensional driven cavity flow 

In this section, two-dimensional incompressible flow in a square cavity at a Reynolds number of 
400 is simulated. The fluid in the cavity is driven by a moving top with constant speed. Because 
driven cavity flow lacks an exact solution, an existing accurate numerical solution for this 
problem is used as a benchmark for comparing our results. Ghia et al. [3] presented numerical 
studies using the vorticity-stream function formulation for solutions up to Re= 10,000 with 
257x257 grid points, and these simulation results have been widely used as a benchmark for the 
driven cavity problem. The schematic diagram of this case with geometry and boundary 
conditions is shown in Fig. 6. The initial condition for the entire computational domain is 
stationary everywhere. In order to compare Ghia’s results, the number of grid points used is 
257x257 or 66,049 and 65,536 cells are used in a quadrilateral mesh. For a triangular mesh, 
66,546 cells, which is approximately the same as quadrilateral mesh, and 33,618 grid points are 
used in our computation. Figure 7(a) shows the u-velocity profile along the horizontal center line 
for both present results with quadrilateral and triangular mesh and Ghia’s result. Both present 



results are in good agreement with Ghia’s result. It also shows the present solution is identical 
and is independent of mesh types. Figure 7(b) shows the u-velocity profile along the horizontal 
center line; again, the results match. Figure 8 presents the u- and v-velocity contours and 
streamline plot from the present results on the quadrilateral mesh. In Fig. 8(a), the u-velocity 
contours ranges from -0.33 to 1.0 with 100 intervals. Solid lines indicate the positive values and 
dashed lines negative ones. In Fig. 8(b), the v-velocity contours ranges from 0.64 to 0.31 with 
100 intervals. The flow structures including the location of the major vortex center, the bubble in 
the right bottom comer, and a small bubble in the left bottom comer are shown clearly in Figure 
8(c), and are in good agreement with the results of Ghia. 

3.3 Three-dimensional driven cavity flow 

The three-dimensional version of the preceding problem is also a standard case for a new flow 
solver. In 1987, Ku et al. [4] simulated three-dimensional flow in a cubic cavity by using 
pseudospectral methods to solve the Navier-Stokes equations for Re = 100, 400, and 1000. In 
2003, Shu et al. [5] repeated this problem by using the SIMPLE algorithm with the differential 
quadrature (DQ) method. They simulated the three-dimensional driven cavity flow at Re = 100, 
200, 400 and 1000 and compared their results with Ku’s results. In this section, we simulated this 
problem at Re = 400 and compared our results with those of Ku and Shu. In order to validate the 
results with different meshes, two grids are used to study this problem. One is a hexahedral mesh 
with 67x67x67 grid points and 287,496 cells, and the other is a tetrahedral mesh with 79,951 grid 
points and 446,953 cells. The geometry of this problem is a unit cube. The boundary condition at 
the y = 1 plane is uniform flow with u=l, v=0, and w=0, and all other boundary conditions are 
no-slip walls. The initial condition for the entire computational domain is stationary. Figure 9(a) 
shows the u-velocity profile at the horizontal centerline of the z = 0.5 plane for the present 
hexahedral and tetrahedral mesh results as well as those of Ku and Shu. Figure 9(b) shows the v- 
velocity profile at the vertical centerline of z = 0.5 for the same set of simulations. Both present 
simulations show essentially identical solutions, and both are in good agreement Ku and Shu. 
Figure 10 shows the velocity vectors and pressure contours taken from the present simulation 
using the hexahedral mesh on the z=0.5, x=0.5, and y=0.5 planes respectively. As before, for the 
pressure contours dashed lines mean negative values. The established flow structures including 
the locations of the vortex center and the positive and negative regions in each plane are clearly 
visible and in good agreement with Ku’s and Shu’s results. 

3.4 Two/Three-dimensional incompressible flow over a circular cylinder 

Flow over a circular cylinder is a standard unsteady test problem. Many two-dimensional 
numerical studies have been done on this flow. In 1 990, Rogers and Kwak [6] studied flow over 
a circular cylinder for Re = 200 using an upwind differencing scheme to solve the 
incompressible Navier-Stokes equations. They compared their results, which were obtained 
using 3rd order and 5th order upwind differencing schemes, with computational and 
experimental results. In 1995, Ku [7] studied this problem for Reynolds numbers 100, 250, 500, 
and 1,000 using a pseudospectral element method. Alonso et al. [8] studied the vortex shedding 
over a circular cylinder at Re = 500 with a multigrid unsteady Navier-Stokes solver with a flux- 
limited dissipation scheme in 1995. In 1998, Liu et al. [9] used preconditioned multigrid methods 
to study unsteady incompressible flows. They investigated flow over a circular cylinder at Re = 
200 and their results was in good agreement with other computational and experimental results. 


In 1999, Ronald et al. [10] reported their results for this problem at Re = 300 using the NASA 
FUN2D code. A final example of two-dimensional cylinder flow simulation is the work of Qian 
and Vezza [11] studied flow over a circular cylinder problem at Re = 1000 with a vorticity-based 
method in 2001. 

Examples of three-dimensional numerical studies of flow over a circular cylinder are the studies 
of Braza and Persillon [12] and Henderson [13]. There are also many experimental studies of 
flow over cylinder problems such as Roshko [14], Wille [15], and Williamson [16], [17]. 
According to Williamson [16], the phenomena of flow over a circular cylinder can be classified 
by Reynolds number. Williamson found the laminar vortex shedding region to occur for 
Reynolds numbers between 49 and 140-194. The three-dimensional wake transition region 
occurs for Re ~ 190 to 260. For the range between Re= 260 ~ 1000, the three-dimensional 
disorder of the wake begin to increase in at the fine scales. This evolves into the shear-layer 
transition regime for Reynolds numbers 1 ,000 up to 20,000. He also noted that three-dimensional 
effects occur for Re > 190. Further detailed explanations of the remaining regimes are reported in 
Williamson [16]. To date, most numerical studies about this problem only focused on a single 
Reynolds number and either two-dimensional or three-dimensional simulations exclusively. The 
current results encompass Reynolds numbers 100, 200, 300, 500, 1,000, and 1500 for two- 
dimensional simulations and 100 and 200 for three-dimensional simulations. These results are 
compared with the appropriate previous studies as well as current results cross-comparisons to 
examine the dimensional and flow regime effects. 

Figure 1 1 shows a schematic diagram for flow over a circular cylinder with dimensions and 
boundary conditions. For three-dimensional simulation, the span of the cylinder is 10D, with D 
representing the reference length equal to the diameter of the cylinder. The boundary conditions 
at z = 0 and z = -10D are periodic, eliminating end effects. The initial condition for the entire 
domain is uniform flow as inflow for all simulations and the time step is 0.005 for all cases. The 
grid for two-dimensional simulations is a quadrilateral mesh with 22705 cells and 22925 grid 
points; for three-dimensional simulation, a hexahedral mesh with 1.1 3M cells and 1.1 7M grid 
points is used. Both grids are densely distributed near the cylinder and wake region and coarser 
near the outer region. All of the two-dimensional simulations are performed on the KFC3a 
cluster with 16 nodes, and all three-dimensional simulations are performed on KFC2, a 48 node 
commodity cluster with AMD Athlon 2000+ CPU and a channel-bonded network. It is noted that 
each three-dimensional simulation took approximately one week with 32 nodes on KFC2. Figure 
12 presents the coefficient of Lift (C L ) and the coefficient of drag (C D ) unsteady histories of the 
present two-dimensional simulations for Re = 100, 200, 300, and 1,000 respectively. The 
Strouhal number (S t ) for these data sets is derived from the frequency of C L . In our simulations, 
higher Reynolds number cases achieve steady Strouhal numbers faster than lower Re cases, 
consistent with other computational results. Table 1 presents the summary of our present two- 
and three-dimensional results along with other computational and experimental results. As 
shown in Table 1, our current results show good agreement with previous data by comparing S h 
Cl, and C D . Fig. 13 shows the two-dimensional vorticity contours for Re = 100, 200, 300 and 
1000. In Fig. 13, the contours are range from -0.3 to 0.3. The three-dimensional co z contours from 
the current simulations are similar to those generated by the two dimensional test cases. As seen 
in Fig. 13, the vortex shedding frequency increases as Re increases. For Re = 100, the vortices 
decay in the downstream. Because of the limited computational domain, we do not see the vortex 



structures merge to large scale vortices in our simulation. For Re = 200, 500 and 1000, not only 
does the vortex strength decay, but also the vortex structures collapse in the far downstream and 
start to merge to large scale vortices. This same phenomenon is reported by Inoue and Yamazaki 
[18]. 

Figure 14(a)-(c) show the present three-dimensional results of co x , © y and © z for Re = 100 and 
(d)-(f) for Re = 200 in the y = 0 plane. The contours of © x and co y range from -0.0005 to 0.0005 
for Re = 100 and from -0.02 to 0.02 for Re = 200, and the contours of to z range from -0.3 to 0.3 
for both cases where bright regions correspond to positive values and dark regions correspond to 
the negative values. Obviously, the magnitude of co z is much larger than the magnitudes of © x 
and (o y . For Re = 100, we can see that the magnitude of co x decreases in the upstream and the 
local minimum appears at the position of 5th “roll” in Fig. 14(a). After that, the magnitude of © x 
begins to grow. In Fig. 14(b), the magnitude of © y grows after the flow past the cylinder. The 
thickness of the vortex “roll” increases downstream. From Fig. 14(c), the magnitude of co z decays 
after the flow past the cylinder due to the energy dissipation. Because the three-dimensional 
effects are very weak at Re = 200, the three-dimensional outcome is very similar to two- 
dimensional case. For Re = 200, as seen from Fig. 14(d), the magnitude of co x decreases in the 
beginning and the local minimum also appears near the 5th “roll”. In the far downstream, a 
transition zone is observed, after which the vortex scales transit from small to large scales. In 
Fig. 14(e), unlike Fig. 14(b), wavy vortex structures are observed. Figure 14(f) shows that co z 
decays after the flow past the cylinder. Figure 1 5 shows the iso-surface of vorticity magnitude 
for Re = 200 from the present three-dimensional simulation result. The flow structures, 
especially the vortex streets in the wake regions, observed in our present results are in good 
agreement with other simulation results. 


4 Conclusions 


A center pressure based method is presented in this paper, and which has been implemented 
successfully to a new two/three-dimensional parallel unstructured incompressible Navier-Stokes 
solver, UNCLE, which has been developed at University of Kentucky. Implementation of using 
different types of meshes with center pressure based method is feasible and straightforward. In 
order to increase flexibility in complex geometries, center pressure based method has been 
extended to use a variety of grid types, such as triangular, quadrilateral, tetrahedral, and 
hexahedral meshes. Mesh independent tests are also made to prove that current method can 
generate identical solutions for different mesh types. By using METIS for domain 
decomposition, excellent parallel load balance is achieved. In this paper, several test cases are 
presented — laminar incompressible flow past a flat plate, steady two/three-dimensional driven 
cavity flow, and unsteady two/three-dimensional flow over a circular cylinder for low Reynolds 
numbers. All these test cases yielded good agreements in comparison with previous 
computational or experimental results. A complete qualitative and quantitative study of two- 
dimensional flow over a circular cylinder for low Reynolds number is presented in tins paper, 
which can be further used as a benchmark solution set for the development of new unsteady 
Navier-stokes solvers. 
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Fig. 1 Schematic diagrams for integration areas, (a) convective fluxes, and (b) diffusive fluxes. 



Fig. 2 A schematic diagram of cell-centered partitioning approach 




Fig. 3(c) Partitioned tetrahedral mesh for 3D flow over a circular cylinder 



Fig. 4 Schematic diagram of laminar incompressible flow past flat plate 
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Fig. 5(a) Plot of t] versus uf U at x=4.5 with present results and Blasius solution 



Fig. 5(b) The plot of Ree versus c/with present results and Blasius solution 



Fig. 6 The schematic diagram of two-dimensional driven cavity flow with boundary conditions 
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Fig. 7(a) The u- velocity profile along the horizontal center line for present results and Ghia’s 
result. 



Fig. 7(b) The v- velocity profile along the vertical center line for present results and Ghia’s result. 
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Fig. 8(a) The u-velocity contour plot for two-dimensional driven cavity flow at Re=400. 
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Fig. 8(b) The v-velocity contour plot for two-dimensional driven cavity flow at Re=400. 
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Fig. 8(c) Streamline plot for two-dimensional driven cavity flow at Re=400. 
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Figure 9(a) The u-velocity profile at the horizontal centerline of z=0.5 plane with present results 
in hexahedral and tetrahedral meshes, and also Ku’s and Shu’s results. 
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Fig. 11 Schematic diagram of flow over a circular cylinder with dimensions and boundary 
conditions 








Fig. 12 Cl and Cd history plots of two-dimensional simulation for (a) Re=100, (b) Re=200, (c) 
Re=300, and (d) Re=1000. 



Fig. 13 Vorticity contours for two-dimensional simulation. (a)Re=100, (b)Re=200, (c)Re=300, 
and (d)Re=1000. 
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Fig. 14 Vorticity component contours for three-dimensional simulation at y=0 plane, where (a)- 
(c) co x , CD y , and co z for Re=100 and (d)-(f) co x , co y , and to z for Re=200. 



Fig. 15 The iso-surface of vorticity magnitude for Re = 200 from present three-dimensional 
simulation result. 





Table 1 Summary of present results and other computational and experimental results 


Re 

100 

200 

300 

500 

1000 

1500 

Present 
2D results 

C l =±0.314 
0=1. 325+0.008 
S,=0.165 

C l =±0.642 

C d =1.318±0.04 

S t =0.196 

C L =10.869 
C D =1. 35410.072 
S,=0.21 

0=±1.H5 
C D =1. 40610.1 19 
S,=0.224 

0=11-378 

0=1.48910.198 

S,=0.239 

0=11-553 

0=1-57510.247 

S t =0.246 

Present 
3D results 

C l =± 0.322 
C D = 1.327+0.009 
S t =0.164 

C l =± 0.664 
C D =1. 32410.042 
S t =0.195 





Computational 2D results 

Rogers and 
Kwak [6] 
(5 th order) 


C l =±0.65 
C D =1. 2310.05 
S,=0. 1 85 





Alonso et 
al. [7] 




0=± 1046 
C D =1 .217 
S t =0.224 



Ku [8] 

C l =±0.228 
C D =1. 33-1.358 
S, =0.1675 



O=±103 
C D = 12 12-1.481 
S, =0.2203 

C L “± 1.242 
C D = 1.1 87-1. 651 
S t =0.2326 


Liu et al. 
[9] 


C L =±0.69 
C D =1. 3 110.049 
S,=0.192 





Ronald et 
al. [10] 



0=10.841 

0=1.34 

S,=0.2036 




Qian and 
Vezza[ll] 





0=1-52 

S,=0.24 


Computational 3D results 

Braza and 
Persillon 
[12] 



S t =0.202 




im 


SrO.178 





Experimental results 

HESM 


S,=0. 1 9 



0=1-2 

S t =0.21 




C d =1.3 





ItmHM 

S,=0.166 


S|=0.203 




Williamson 

[17] 


S t =0.197 





















